[DRAFT] NETOBSERV-2284 FLP metrics cache optimization#1243

Closed
jpinsonneau wants to merge 4 commits into netobserv:main from jpinsonneau:2284

jpinsonneau (Member) commented Apr 14, 2026

Description

Implement TTL support for metrics. See upstream proposal: prometheus/client_golang#1983
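As a rough illustration of what per-series TTL expiry means, here is a minimal sketch that stores a last-write timestamp next to each series and drops series that have not been written within the TTL. All names (`ttlSeries`, `Add`, `Expire`, `demo`) are hypothetical; it uses only the standard library and is neither the API proposed in prometheus/client_golang#1983 nor the code in this PR.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry is one tracked series: its current value and when it was last written.
type entry struct {
	value    float64
	lastSeen time.Time
}

// ttlSeries is a hypothetical, simplified stand-in for a metric vector that
// keeps a last-write timestamp per label combination.
type ttlSeries struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]*entry
}

func newTTLSeries(ttl time.Duration) *ttlSeries {
	return &ttlSeries{ttl: ttl, entries: make(map[string]*entry)}
}

// Add increments the series identified by key and refreshes its timestamp.
// The clock is passed in so the behaviour is deterministic in tests.
func (s *ttlSeries) Add(key string, v float64, now time.Time) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.entries[key]
	if !ok {
		e = &entry{}
		s.entries[key] = e
	}
	e.value += v
	e.lastSeen = now
}

// Expire removes every series whose last write is older than the TTL and
// returns the number of series removed.
func (s *ttlSeries) Expire(now time.Time) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	removed := 0
	for key, e := range s.entries {
		if now.Sub(e.lastSeen) > s.ttl {
			delete(s.entries, key) // deleting while ranging over a map is safe in Go
			removed++
		}
	}
	return removed
}

// Len returns the number of live series.
func (s *ttlSeries) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.entries)
}

// demo writes two series, refreshes only one, then expires with a 1 minute TTL.
func demo() (removed, remaining int) {
	start := time.Unix(0, 0)
	s := newTTLSeries(time.Minute)
	s.Add(`namespace="a"`, 1, start)
	s.Add(`namespace="b"`, 1, start)
	s.Add(`namespace="a"`, 1, start.Add(40*time.Second)) // refresh "a" only
	removed = s.Expire(start.Add(70 * time.Second))
	return removed, s.Len()
}

func main() {
	removed, remaining := demo()
	fmt.Println(removed, remaining) // prints "1 1"
}
```

In this sketch the timestamp lives in the same map entry as the value, so no second structure has to mirror the set of live series.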

Local benchmark run:

| Aspect | Old (TimedCache) | New (Vec TTL) |
|---|---|---|
| Throughput | ~2,730 ns/op, 8 allocs | ~2,800 ns/op, 8 allocs |
| Memory @ 10K series | 29,504 KB (3,021 B/series) | 22,504 KB (2,304 B/series) |
| Memory savings | | ~24% less |
| Cleanup @ 1K series | 325 µs | 227 µs (1.4x faster) |
| Cleanup @ 10K series | 2.79 ms | 2.72 ms (~same) |

6-node ROSA cluster with all features enabled and default metrics:

Stock Image (TimedCache) — 3 pods, 10 min steady state

| Pod | Heap (avg 5m) | Series (avg 5m) | CPU (5m rate) | GC (ms/s) | RSS |
|---|---|---|---|---|---|
| mmvrt | 54.7 MB | 4509 | 123.1m | 0.305 | 95.5 MB |
| cvl7f | 35.2 MB | 2158 | 40.5m | 0.121 | 107.5 MB |
| 88hpb | 29.2 MB | 1658 | 15.5m | 0.019 | 80.0 MB |
| Total | 119.1 MB | 8325 | 179.1m | 0.445 | 283.0 MB |

TTL Image (Vec-native TTL, no TimedCache) — 3 pods, 10 min steady state

| Pod | Heap (avg 5m) | Series (avg 5m) | CPU (5m rate) | GC (ms/s) | RSS |
|---|---|---|---|---|---|
| jhpg7 | 40.4 MB | 2916 | 59.2m | 0.106 | 101.5 MB |
| psvh4 | 35.7 MB | 2512 | 59.1m | 0.169 | 96.5 MB |
| px8rx | 21.3 MB | 1335 | 25.0m | 0.066 | 38.0 MB |
| Total | 97.4 MB | 6763 | 143.3m | 0.341 | 236.0 MB |

Normalized Comparison

| Metric | Stock (TimedCache) | TTL (Vec-native) | Delta |
|---|---|---|---|
| Per-series heap overhead | ~5.3 KB/series | ~3.3 KB/series | -38% |
| GC pressure (busiest pod) | 0.305 ms/s | 0.169 ms/s | -45% |
| CPU per series | ~21.5 µcores/series | ~21.2 µcores/series | ~same |

6-node ROSA cluster with all features enabled and 34 metrics enabled (high cardinality):

High Cardinality Comparison (all 34 metrics enabled)

Stock Image (TimedCache) — 3 pods, 10 min steady state

| Pod | Heap | Series | CPU (5m) | GC (ms/s) | RSS |
|---|---|---|---|---|---|
| lpb4c | 80.2 MB | 7435 | 111.3m | 0.247 | 106.5 MB |
| 98rvh | 72.1 MB | 6428 | 83.9m | 0.050 | 155.3 MB |
| 8mkf6 | 50.2 MB | 4616 | 54.8m | 0.074 | 123.1 MB |
| Total | 202.5 MB | 18479 | 250.0m | 0.371 | 384.9 MB |

TTL Image (Vec-native TTL) — 3 pods, 10 min steady state

| Pod | Heap | Series | CPU (5m) | GC (ms/s) | RSS |
|---|---|---|---|---|---|
| bn76q | 41.5 MB | 9572 | 69.0m | 0.079 | 132.7 MB |
| 9w982 | 60.7 MB | 8293 | 130.5m | 0.215 | 123.4 MB |
| frhwr | 44.5 MB | 7560 | 21.9m | 0.072 | 87.4 MB |
| Total | 146.7 MB | 25425 | 221.4m | 0.366 | 343.5 MB |

Normalized

| Metric | Stock | TTL | Delta |
|---|---|---|---|
| Total heap (3 pods) | 202.5 MB | 146.7 MB | -27.6% |
| Total series | 18479 | 25425 | +37.6% |
| Per-series heap | ~11.0 KB | ~5.8 KB | -47% |
| Total RSS | 384.9 MB | 343.5 MB | -10.8% |
| Total CPU | 250.0m | 221.4m | ~same |
| GC pressure | 0.371 ms/s | 0.366 ms/s | ~same |
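For readers who want to re-derive the high-cardinality normalized figures, the short sketch below recomputes them from the "Total" rows of the two per-pod tables. The type and function names are illustrative only, and it assumes 1 MB = 1000 KB, which is the convention that reproduces the reported ~11.0 KB and ~5.8 KB per-series values.

```go
package main

import (
	"fmt"
	"math"
)

// totals holds figures summed over the three pods of one image.
type totals struct {
	heapMB float64
	series float64
	rssMB  float64
}

// Values copied from the "Total" rows of the high-cardinality tables.
var (
	stock = totals{heapMB: 202.5, series: 18479, rssMB: 384.9}
	ttl   = totals{heapMB: 146.7, series: 25425, rssMB: 343.5}
)

// perSeriesKB divides total heap by the number of series.
// Assumption: 1 MB = 1000 KB (decimal units).
func perSeriesKB(t totals) float64 { return t.heapMB * 1000 / t.series }

// deltaPct is the relative change from before to after, in percent.
func deltaPct(before, after float64) float64 {
	return (after - before) / before * 100
}

// round1 rounds to one decimal place.
func round1(v float64) float64 { return math.Round(v*10) / 10 }

func main() {
	fmt.Printf("total heap delta:   %.1f%%\n", deltaPct(stock.heapMB, ttl.heapMB))
	fmt.Printf("total series delta: %.1f%%\n", deltaPct(stock.series, ttl.series))
	fmt.Printf("per-series heap:    %.1f KB -> %.1f KB (%.1f%%)\n",
		perSeriesKB(stock), perSeriesKB(ttl),
		deltaPct(perSeriesKB(stock), perSeriesKB(ttl)))
	fmt.Printf("total RSS delta:    %.1f%%\n", deltaPct(stock.rssMB, ttl.rssMB))
}
```

Because the two runs did not carry the same number of series, the per-series ratio is the more comparable figure than the raw heap totals.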

Assisted-by: claude-4.6-opus-high

Dependencies

Requires the upstream changes to the Prometheus client library, or a fork carrying them (prometheus/client_golang@2444fef).

Checklist

  • Do the changes in this PR need specific configuration or environment setup for testing?
    • If so, please describe it in the PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

To run a perfscale test, comment with: /test flp-node-density-heavy-25nodes

openshift-ci Bot commented Apr 14, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai Bot commented Apr 14, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dbbff597-6118-46f0-b96a-a832e94e77ce

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

openshift-ci Bot commented Apr 14, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign oliviercazade for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jpinsonneau (Member, Author)

/test flp-node-density-heavy-25nodes

openshift-ci Bot commented Apr 14, 2026

@jpinsonneau: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/flp-node-density-heavy-25nodes | 364e59d | link | true | /test flp-node-density-heavy-25nodes |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

jpinsonneau requested a review from jotak on April 17, 2026 09:08

codecov Bot commented Apr 17, 2026

Codecov Report

❌ Patch coverage is 80.82192% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.27%. Comparing base (8c1640d) to head (4fcc3f0).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/pipeline/encode/metrics_common.go | 78.78% | 11 Missing and 3 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1243      +/-   ##
==========================================
+ Coverage   66.25%   66.27%   +0.02%     
==========================================
  Files         121      121              
  Lines        7983     8030      +47     
==========================================
+ Hits         5289     5322      +33     
- Misses       2344     2355      +11     
- Partials      350      353       +3     
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 66.27% <80.82%> (+0.02%) | ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ | |
|---|---|---|
| pkg/pipeline/encode/encode_prom.go | 49.72% <100.00%> (-1.59%) | ⬇️ |
| pkg/pipeline/encode/metrics_common.go | 83.41% <78.78%> (-2.40%) | ⬇️ |

... and 1 file with indirect coverage changes

@jpinsonneau (Member, Author)

Closing in favor of #1247
